1. Introduction


In the last year, the author has been pursuing ideas intended to liberate information theory as much as possible from the equations and inequalities which make it a very difficult subject to apply in practice. Algebraic, geometric and order theoretic ideas have been discovered which make it much easier to study the process of communication. This paper is about topology, a subject not traditionally used in information theory, and some of the problems that it has played a key role in solving. Conversely, we will also see that information theoretic ideas may hold a place in topology.

Topological ideas have recently proven important in information theory: in reducing the threat of covert communication [6], in studying the limiting behavior of the capacity achieving distribution [5], and in comparing channel behavior [6]. In one manner or another, all of these applications require knowing that the amount of information one can get through a channel varies continuously as a function of noise. Thus far, however, this continuity has only been verified in the case of binary channels [6]. In this paper, we prove that the capacity of a channel with any number of inputs and any number of outputs is continuous; in fact, we show that this also holds for timing channels with arbitrary inputs and outputs. The result provides a rigorous mathematical foundation for results in [5] and also yields new results, such as a characterization of when the equation arising in the capacity reduction problem is solvable for general timing channels. So we see topology as a useful tool in information theory.

Unexpectedly, though, we have also recently discovered that ideas from information theory are relevant in topology: the most important topology in mathematics, the Euclidean topology, is determined by channel capacity. This follows from the more fundamental fact that capacity is a Lebesgue measurement on the interval domain, a result whose proof reveals a deep connection between the study of measurement in domain theory and the remarkable result of Majani and Rumsey in information theory [2]. The fact that capacity is a measurement makes it possible to reason about capacity by examining only the probabilities in the noise matrix of a channel. We illustrate the power of this idea by studying the effect that amplitude damping has on communication with qubits, connecting quantum information's intuitive use of the word 'noise' to information theory's more rigorous use of it.
2. Topology in information theory

2.1. The Euclidean continuity of capacity

In this section, we establish a continuity principle for information theory that explains why the largest amount of information one can transmit through a communication channel varies continuously as a function of noise. In anticipation of future work in relativity [7], we formulate the principle for functions of the form $f : X \times Y \to Z$, where $Z$ is a bicontinuous poset and $X$ and $Y$ are topological spaces. Let us now review the definition of a bicontinuous poset.

Definition 2.1. Let $(P, \sqsubseteq)$ be a partially ordered set. A nonempty subset $S \subseteq P$ is directed if $(\forall x, y \in S)(\exists z \in S)\ x, y \sqsubseteq z$. The supremum of $S \subseteq P$ is the least of all its upper bounds provided it exists. This is written $\bigsqcup S$.

Dually, a nonempty $S \subseteq P$ is filtered if $(\forall x, y \in S)(\exists z \in S)\ z \sqsubseteq x, y$. The infimum $\bigsqcap S$ of $S \subseteq P$ is the greatest of all its lower bounds provided it exists.

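Definition 2.2. For a subset $X$ of a poset $P$, set

$$\uparrow\! X := \{y \in P : (\exists x \in X)\ x \sqsubseteq y\} \quad \& \quad \downarrow\! X := \{y \in P : (\exists x \in X)\ y \sqsubseteq x\}.$$

We write $\uparrow\! x = \uparrow\!\{x\}$ and $\downarrow\! x = \downarrow\!\{x\}$ for elements $x \in P$.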

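Definition 2.3. For elements $x, y$ of a poset, write $x \ll y$ iff for all directed sets $S$ with a supremum,

$$y \sqsubseteq \bigsqcup S \implies (\exists s \in S)\ x \sqsubseteq s.$$

We set ${\downarrow\!\!\downarrow} x = \{a \in P : a \ll x\}$ and ${\uparrow\!\!\uparrow} x = \{a \in P : x \ll a\}$.

For the symbol $\ll$, read "approximates".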

Definition 2.4. A basis for a poset $P$ is a subset $B$ such that $B \cap {\downarrow\!\!\downarrow} x$ contains a directed set with supremum $x$ for all $x \in P$. A poset is continuous if it has a basis. A poset is $\omega$-continuous if it has a countable basis.

A possible objection to our notion of 'approximation' is that it is biased toward suprema; one could similarly define a form of approximation in terms of infima. All bias is removed when we require that these notions coincide.

Definition 2.5. A continuous poset $P$ is bicontinuous if

• For all $x, y \in P$, $x \ll y$ iff for all filtered $S \subseteq P$ with an infimum,

$$\bigsqcap S \sqsubseteq x \implies (\exists s \in S)\ s \sqsubseteq y,$$
and

• For each $x \in P$, the set ${\uparrow\!\!\uparrow} x$ is filtered with infimum $x$.

Examples of bicontinuous posets are $\mathbb{R}$ and $\mathbb{Q}$ with the usual order.

Definition 2.6. On a bicontinuous poset $P$, sets of the form

$$(a, b) := \{x \in P : a \ll x \ll b\}$$
form a basis for a topology called the interval topology.

That the open intervals $(a, b)$ form a basis for a topology follows from bicontinuity and the fact that continuous posets are interpolative: if $x \ll y$ in a continuous poset $P$, then there is $z \in P$ with $x \ll z \ll y$. On a bicontinuous poset $P$, the interval topology is Hausdorff, and $\sqsubseteq$ is a closed subset of $P^2$. The sets $\downarrow\! x$ and $\uparrow\! x$ are closed, while the sets ${\uparrow\!\!\uparrow} x$ and ${\downarrow\!\!\downarrow} x$ are open.

Lemma 2.7. If $f : X \to Y$ is a continuous function from a compact space to a bicontinuous poset and the image of $f$ is directed, then

$$f(w) = \bigsqcup_{x \in X} f(x)$$

for some $w \in X$.

Proof. First, we need to show that the indicated supremum exists. The set $K = f(X)$ is a compact subset of $Y$ by the continuity of $f$. By assumption, $K$ is directed. For each $t \in K$, the set

$$K_t := \uparrow\! t \cap K$$

is nonempty because it contains $t$, and compact because it is a closed subset of $K$, using that $\uparrow\! t$ is a closed subset of $Y$. By the directedness of $K$, the collection $\{K_t : t \in K\}$ is a filtered family of nonempty compact subsets of $K$: any finitely many $t_1, \ldots, t_n \in K$ have an upper bound $z \in K$, and then $\emptyset \neq K_z \subseteq K_{t_1} \cap \cdots \cap K_{t_n}$. By the compactness of $K$, this finite intersection property gives

$$\bigcap_{t \in K} K_t \neq \emptyset.$$

Let $u$ be a point in this intersection. Then $t \sqsubseteq u$ for all $t \in K$, and $u \in K$. Thus, $u = \bigsqcup K$. Because $u \in K$, there is $w \in X$ with $f(w) = u$. □
We should point out that a compact set in a bicontinuous poset need not be directed. For instance, take the disjoint union $Y = (0, 1/2) \cup (1/2, 1)$, ordered so that elements compare in the usual order only when they belong to the same set. Then $\{1/4, 3/4\}$ is a compact subset with no upper bound. Notice that $Y$ is even globally hyperbolic.¹

By taking $Y = \mathbb{R}$, in which every subset is directed, we obtain the well-known fact from calculus that a continuous real-valued function on a compact set attains a maximum value.

Definition 2.8. If $f : X \times Y \to Z$ is a function that maps into a poset $Z$, we say that its image is directed in its second variable if $f(\{x\} \times Y)$ is a directed subset of $Z$ for each $x \in X$.

Theorem 2.9. Let $X$ and $Y$ be spaces with $Y$ compact, and let $Z$ denote a bicontinuous poset. If $f : X \times Y \to Z$ is a continuous function whose image is directed in its second variable, then $f^* : X \to Z$ given by

$$f^*(x) = \bigsqcup_{y \in Y} f(x, y)$$

is a continuous function.

Proof. For a fixed $x$, the function $f_x : Y \to Z$ given by $f_x(y) = f(x, y)$ is continuous, as the restriction of the continuous $f$, and has a compact domain, since $\{x\} \times Y$ is the product of compact spaces. By Lemma 2.7, $f^*$ is a well-defined function.

Suppose now that $f^*(x) \in (a, b)$. Because $\bigsqcup f(\{x\} \times Y)$ actually belongs to $f(\{x\} \times Y)$, as shown in the previous argument, we know that $f^*(x) = f(x, v)$ for some $v \in Y$. By the continuity of $f$, there are open sets $U \subseteq X$ and $V \subseteq Y$ with $(x, v) \in U \times V$ such that

$$f(x, v) \in f(U \times V) \subseteq (a, b).$$

Thus, if we have any $t \in U$, then

$$a \ll f(t, v) \sqsubseteq \bigsqcup_{y \in Y} f(t, y) = f^*(t),$$

which means $f^*(t) \in {\uparrow\!\!\uparrow} a$ when $t \in U$. Now we want to find an open set $W \subseteq X$ with $x \in W$ and $f^*(t) \in {\downarrow\!\!\downarrow} b$ for $t \in W$. If we do, then we will have $f^*(t) \in {\uparrow\!\!\uparrow} a \cap {\downarrow\!\!\downarrow} b = (a, b)$ for all $t \in U \cap W$, which will establish the continuity of $f^*$, since the sets $(a, b)$ are a basis for the interval topology on $Z$.

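Let $I = \{U \subseteq X : U \text{ is open} \ \&\ x \in U\}$. By way of contradiction, we can assume that

$$(\forall i \in I)(\exists x_i \in i)\ f(\{x_i\} \times Y) \cap (Z \setminus {\downarrow\!\!\downarrow} b) \neq \emptyset.$$

(Otherwise, there is $i \in I$ with $f(\{t\} \times Y) \subseteq {\downarrow\!\!\downarrow} b$ for all $t \in i$, and since we always know that $f^*(t) \in f(\{t\} \times Y)$, the proof would be finished.) By Lemma 2.7, there is $w_i \in Y$ with $f^*(x_i) = f(x_i, w_i)$. Since $f^*(x_i)$ lies above every element of $f(\{x_i\} \times Y)$ and ${\downarrow\!\!\downarrow} b$ is a lower set, we get $f(x_i, w_i) \in Z \setminus {\downarrow\!\!\downarrow} b$. The set $I$ is a directed poset under the order $U \le V \iff V \subseteq U$. Thus, by the compactness of $Y$, the net² $(w_i)$ has a subnet, specified by a directed subset $J$ of $I$, which converges to a point $w \in Y$. Because the net $(x_i)$ converges to $x$, the subnet $(x_j)$ also converges to $x$. Then, since $f(x_j, w_j) \in Z \setminus {\downarrow\!\!\downarrow} b$ for all $j$, the continuity of $f$ and the fact that $Z \setminus {\downarrow\!\!\downarrow} b$ is closed give

$$\lim_{j} f(x_j, w_j) = f(x, w) \in Z \setminus {\downarrow\!\!\downarrow} b.$$

This contradicts $f(x, w) \sqsubseteq f^*(x) \ll b$. Thus there is an open set $W$ with $x \in W$ and $f^*(t) \ll b$ for all $t \in W$. □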

This result can be summarized by saying "if $f$ is directed in a compact variable, then $f^*$ is continuous". One particular application of this theorem, the one we will be concerned with in the present work, is the case when $Z = \mathbb{R}$:

Corollary 2.10. If $f : X \times Y \to \mathbb{R}$ is a continuous function with $Y$ compact, then

$$f^*(x) = \sup_{y \in Y} f(x, y)$$

defines a continuous function from $X$ into $\mathbb{R}$.
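Compactness of $Y$ cannot simply be dropped from Corollary 2.10. The following standard counterexample is our own illustration rather than part of the original argument; it takes $X = [0, 1]$ and the non-compact $Y = [0, \infty)$:

```latex
f(x, y) = \min\{1,\ xy\}, \qquad
f^*(x) = \sup_{y \ge 0} f(x, y) =
\begin{cases} 0, & x = 0, \\ 1, & 0 < x \le 1. \end{cases}
% f is continuous on [0,1] x [0,\infty), yet f^* jumps at x = 0.
% With the compact Y = [0, N] instead, f^*(x) = \min\{1, Nx\}, which is continuous.
```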

A nice aspect of bicontinuous posets is that the order dual of $Z$, the poset $Z^*$, is also bicontinuous. Thus, the previous results hold if sup and directed are replaced by inf and filtered. Now we come to the importance of this result in information theory.

Let $u$ denote the noise matrix for an $(m, n)$ channel $f$, i.e. a channel with $m$ inputs and $n$ outputs. Then $u = (u_1, \ldots, u_m)$ is a vector of classical states, each $u_i \in \Delta^n := \{x \in [0, 1]^n : \sum x_i = 1\}$ being a row of the matrix $u$. If the inputs to the channel $f : \Delta^m \to \Delta^n$ are distributed as $x \in \Delta^m$, then the outputs are distributed as

$$f(x) := x \cdot u \in \Delta^n.$$


¹ A poset $(X, \sqsubseteq)$ is globally hyperbolic if it is bicontinuous and if the sets $[a, b] := \{x \in X : a \sqsubseteq x \sqsubseteq b\}$ are compact in the interval topology.

² Those unfamiliar with the basic properties of nets will find them discussed in the Appendix.
Given positive times $t = (t_1, \ldots, t_m)$, one for each input symbol, the mutual information is the function $I_t : \Delta^m \to \mathbb{R}$ given by

$$I_t(x) = \frac{H(f(x)) - \sum_{i=1}^{m} x_i H(u_i)}{t \cdot x}$$

where $H$ is the base two entropy

$$H(x) = -\sum_{i} x_i \log_2(x_i),$$

with the convention $0 \log_2 0 = 0$.
The capacity of an $(m, n)$ timing channel with times $t = (t_1, \ldots, t_m)$ is then the function

$$C_t : \prod_{i=1}^{m} \Delta^n \to \mathbb{R}$$

given by

$$C_t(u) = \sup_{x \in \Delta^m} I_t(u, x)$$

where $I_t(u, x)$ explicitly denotes the dependence of $I_t$ on the noise matrix $u$. This is called the timed capacity. If the times satisfy $t_i = 1$ for all $i$, the channel $f$ is called untimed, and in that case we abbreviate the capacity to $C(u)$. Untimed channels, which are what people most often study in the literature, amount to assuming that every symbol costs the same to send; the motivation for studying general timing channels is that in practice the cost of sending a symbol can easily depend on the symbol sent.
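The supremum defining $C_t$ has no closed form in general, but for two inputs it can be approximated numerically. The sketch below uses ternary search over $x \in [0, 1]$, a technique we substitute for illustration (it is not the method of [5]). It relies on the fact that, when the rows of $u$ differ, $x \mapsto I_t(u, x)$ is a strictly concave numerator over a positive affine denominator and hence strictly quasi-concave; when the rows are equal, $I_t$ is identically 0. The helper names are hypothetical.

```python
import math

def entropy2(p):
    """Base-two entropy with 0 log 0 = 0."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def timed_info_2(u, x, t):
    """I_t(u, (x, 1 - x)) for a two-input channel with rows u[0], u[1] and times t."""
    q = [x * r0 + (1 - x) * r1 for r0, r1 in zip(u[0], u[1])]
    numerator = entropy2(q) - x * entropy2(u[0]) - (1 - x) * entropy2(u[1])
    return numerator / (x * t[0] + (1 - x) * t[1])

def timed_capacity_2(u, t, iterations=100):
    """Approximate C_t(u) = sup_x I_t(u, x) for m = 2 inputs by ternary search.

    Returns (approximate capacity, maximizing probability x of input 0).
    """
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if timed_info_2(u, m1, t) < timed_info_2(u, m2, t):
            lo = m1          # by strict quasi-concavity the maximizer is not in [lo, m1]
        else:
            hi = m2          # ... and in this case it is not in (m2, hi]
    x = (lo + hi) / 2
    return timed_info_2(u, x, t), x

# Noiseless binary channel whose second symbol takes twice as long to send.
print(timed_capacity_2([[1.0, 0.0], [0.0, 1.0]], (1.0, 2.0)))
```

For this noiseless example the capacity should approach $\log_2$ of the golden ratio, about 0.694 bits per unit time, attained near $x \approx 0.618$.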

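Theorem 2.11. Timed capacity is a Euclidean continuous function of noise.

Proof. The function $I_t : \left(\prod_{i=1}^{m} \Delta^n\right) \times \Delta^m \to \mathbb{R}$ given by

$$I_t(u, x) = \frac{H(f(x)) - \sum_{i=1}^{m} x_i H(u_i)}{t \cdot x}$$

is continuous, since $H$ is continuous and the denominator satisfies $t \cdot x \ge \min_i t_i > 0$, and it is defined on the product of two compact spaces. By Corollary 2.10, the function $C_t : \prod_{i=1}^{m} \Delta^n \to \mathbb{R}$ is also continuous. □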
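Theorem 2.11 can be watched at work numerically. The sketch below is an illustration under our own assumptions, not a proof: it fixes hypothetical times $t = (1, 3)$, shrinks the gap between the two rows of a (2, 2) channel, and maximizes $I_t$ by ternary search (valid because, for distinct rows, $I_t$ is a strictly concave numerator over a positive affine denominator). The computed capacity should fall toward the value $0$ taken on the diagonal.

```python
import math

def entropy2(p):
    """Base-two entropy with 0 log 0 = 0."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def timed_capacity_2(u, t, iterations=100):
    """sup over x in [0, 1] of I_t(u, (x, 1 - x)) for a two-input channel, by ternary search."""
    def info(x):
        q = [x * r0 + (1 - x) * r1 for r0, r1 in zip(u[0], u[1])]
        num = entropy2(q) - x * entropy2(u[0]) - (1 - x) * entropy2(u[1])
        return num / (x * t[0] + (1 - x) * t[1])
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if info(m1) < info(m2):
            lo = m1
        else:
            hi = m2
    return info((lo + hi) / 2)

def binary_channel(a, b):
    """Noise matrix with rows (a, 1 - a) and (b, 1 - b)."""
    return [[a, 1 - a], [b, 1 - b]]

t = (1.0, 3.0)   # hypothetical symbol times
for delta in (0.1, 0.01, 0.001, 0.0):
    print(delta, timed_capacity_2(binary_channel(0.3, 0.3 + delta), t))
```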


2.2. Discontinuity of the capacity achieving distribution


We have taken an abstract approach to establishing the continuity of timed capacity, so abstract, in fact, that it may mislead the reader into believing that the result is simple. However, when looked at from an applied perspective, which we now adopt, it is a nontrivial fact that timed capacity is continuous. It is important to realize that not all essential quantities in information theory are continuous. For instance, the capacity achieving distribution $x(a, b)$ of a (2, 2) timing channel has no continuous extension to the unit square, and there is irony in this last remark: the discontinuity of $x(a, b)$ follows from the continuity of timed capacity! Let us now explain this remarkable fact.

In the case of a (2, 2) timing channel, the noise matrix $u$ is

$$u = \begin{pmatrix} a & \bar{a} \\ b & \bar{b} \end{pmatrix}$$

where $a = P(0|0)$, $\bar{a} := 1 - a = P(1|0)$, $b = P(0|1)$ and $\bar{b} := 1 - b = P(1|1)$. Thus, we can think of $u$ as really being a pair of probabilities $(a, b)$. In [5], it is shown that the capacity $C_t(a, b)$ of a (2, 2) timing channel $(a, b)$ with times $t = (t_1, t_2)$ and $a \neq b$ is given by

$$C_t(a, b) = \frac{1}{t_1 \ln(2)}\left(\frac{H(a)(b-1) + H(b)(1-a)}{a-b} - \ln f(x)\right)$$

where $x$ is the unique number in $[0, 1]$ that satisfies

$$e^{-K/(a-b)}\, f(x)^{t_2} - \bigl(1 - f(x)\bigr)^{t_1} = 0,$$

with $f(x) = (a-b)x + b$, $K = (b\varepsilon - t_2)H(a) + (t_2 - a\varepsilon)H(b)$, $\varepsilon = t_2 - t_1$, and $H$ the base $e$ entropy

$$H(x) = -x\ln(x) - (1-x)\ln(1-x).$$

The capacity of a (2, 2) timing channel is zero iff $a = b$. This characterization of timed capacity is needed in practice so that we know how to calculate it. From the applied perspective, then, it is nontrivial that timed capacity varies continuously with $a$ and $b$ on the entire unit square: the expression for $C_t$ contains a $1/(a-b)$ term, so its behavior as we approach the diagonal $\{(x, x) : x \in [0, 1]\}$ is a priori uncertain.

Let us now consider the capacity achieving distribution $x$. Solving for $x$ as a function of $a$ and $b$ gives

$$x(a, b) = \frac{1}{a-b}\left(e^{\theta(a,b)} - b\right), \qquad (1)$$

where

$$\theta(a, b) = \frac{H(a)(b-1) + H(b)(1-a)}{a-b} - C_t(a, b)\, t_1 \ln(2).$$
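As a sanity check on the formulas above (a short derivation of ours, not taken from [5]), take the noiseless channel $a = 1$, $b = 0$. Then $H(a) = H(b) = 0$, $K = 0$ and $f(x) = x$, and the formulas reduce to Shannon's noiseless timing capacity, the quantity that reappears as $\log_2 \omega$ in Theorem 2.13:

```latex
x^{t_2} = (1 - x)^{t_1}, \qquad
C_t(1, 0) = -\frac{\ln x}{t_1 \ln 2} = \log_2 W, \quad \text{where } W := x^{-1/t_1}.
% Then x = W^{-t_1} and 1 - x = x^{t_2 / t_1} = W^{-t_2}, so
W^{-t_1} + W^{-t_2} = 1.
```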


By Eq. (1) and the continuity of timed capacity, the input distribution which achieves capacity varies continuously as a function of $a$ and $b$ on $\{(a, b) \in [0, 1] \times [0, 1] : a \neq b\}$. The authors of [5] made use of the continuity of timed capacity since they both knew that the present author had shown, but not published, Theorem 2.11. Assuming the continuity of capacity, they were then able to prove:


Theorem 2.12. The function $x$ has no continuous extension to the entire unit square.


The proof proceeds by showing that neither

$$\lim_{(a,b) \to (0^+, 0^+)} x(a, b) \quad \text{nor} \quad \lim_{(a,b) \to (1^-, 1^-)} x(a, b)$$

exists. The notation $(a, b) \to (0^+, 0^+)$ means that we are free to approach the origin along any path provided we stay within the unit square; the notation $(a, b) \to (1^-, 1^-)$ says the same of $(1, 1)$. They then establish the following four limits:

• $\lim_{a \to 0^+} x(a, 0) = 1/e$

• $\lim_{b \to 0^+} x(0, b) = 1 - 1/e$

• $\lim_{b \to 1^-} x(1, b) = 1 - 1/e$

• $\lim_{a \to 1^-} x(a, 1) = 1/e$

Since the first two limits differ, $x$ has no limit at $(0, 0)$; likewise, the last two show that it has no limit at $(1, 1)$.


To give the reader a feel for how $1/e$ and $1 - 1/e$ can always arise as limits, independent of the values taken by $t_1$ and $t_2$, let us calculate the first one. Setting $b = 0$ in Eq. (1) and using $H(0) = 0$ and $e^{-H(a)/a} = a(1-a)^{(1-a)/a}$, we have

$$x(a, 0) = (1-a)^{(1-a)/a} \cdot e^{-C_t(a, 0)\, t_1 \ln(2)}.$$

Because $C_t$ is a continuous function of $(a, b) \in [0, 1] \times [0, 1]$, as $a \to 0^+$ we have $C_t(a, 0) \to C_t(0, 0) = 0$. Thus, $x(a, 0)$ is the product of two functions which have limits as $a \to 0^+$, which means that $x(a, 0)$ has a limit as $a \to 0^+$. This limit is

$$\lim_{a \to 0^+} x(a, 0) = \lim_{a \to 0^+} (1-a)^{(1-a)/a} \cdot \lim_{a \to 0^+} e^{-C_t(a, 0)\, t_1 \ln(2)} = \lim_{a \to 0^+} \frac{(1-a)^{1/a}}{1-a} \cdot 1 = \frac{1}{e} \cdot 1 = \frac{1}{e}.$$
And so $x$ cannot be continuously extended to the entire unit square. In the case of capacity, the $1/(a-b)$ term does not prevent a continuous extension to the unit square, but in the case of the actual distribution $x(a, b)$ that achieves capacity, it does. The discontinuity of $x(a, b)$ becomes even more disturbing when looked at from the practical viewpoint.
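The two edge limits can also be observed numerically. The sketch below restricts to the untimed case $t_1 = t_2 = 1$ (an assumption made for simplicity), where the capacity achieving distribution has the closed form of Eq. (2) in Section 3.1; `h2` and `untimed_x` are our own helper names.

```python
import math

def h2(p):
    """Binary entropy in bits (0 log 0 = 0); log1p keeps (1 - p) log(1 - p) accurate for tiny p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log1p(-p) / math.log(2))

def untimed_x(a, b):
    """Capacity achieving P(input 0) of the untimed (2, 2) channel (a, b), a != b:
    x = (1 / (1 + 2**A) - b) / (a - b) with A = (H(a) - H(b)) / (a - b)."""
    A = (h2(a) - h2(b)) / (a - b)
    return (1 / (1 + 2 ** A) - b) / (a - b)

# Approach the origin along the two edges of the unit square.
for s in (1e-2, 1e-4, 1e-6):
    print(f"s={s:g}  x(s,0)={untimed_x(s, 0.0):.6f}  x(0,s)={untimed_x(0.0, s):.6f}")
print(f"1/e={1 / math.e:.6f}  1-1/e={1 - 1 / math.e:.6f}")
```

As $s$ shrinks, the two columns should approach $1/e \approx 0.368$ and $1 - 1/e \approx 0.632$ respectively.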

Suppose that we have a timing channel with noise matrix $u$ and capacity achieving distribution $x$. If we vary $u$ ever so slightly in the sense of Euclidean distance, thereby obtaining a new channel with matrix $u_\varepsilon$, it would seem reasonable that the original distribution $x$ would be a good approximation to the capacity achieving distribution $x_\varepsilon$ for the new channel $u_\varepsilon$. It is exactly the fact that $x$ is not continuously extendible to the unit square which proves that this is not true! For instance, two positive capacity channels can have noise matrices as close as one likes, and yet their respective capacity achieving distributions can be nearly a maximum distance apart [5]. Moreover, the kind of channels we encounter in a neighborhood of $(0, 0)$ occur naturally in practice [8].
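A concrete instance of this phenomenon, sketched in the untimed case (our simplifying assumption, with our own helper names): the channels $(s, 0)$ and $(0, s)$ are within Euclidean distance $s\sqrt{2}$ of each other, yet their capacity achieving distributions sit near opposite ends of Majani and Rumsey's interval $(1/e, 1 - 1/e)$.

```python
import math

def h2(p):
    """Binary entropy in bits, with 0 log 0 = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log1p(-p) / math.log(2))

def untimed_x(a, b):
    """Capacity achieving P(input 0) for the untimed (2, 2) channel (a, b), a != b."""
    A = (h2(a) - h2(b)) / (a - b)
    return (1 / (1 + 2 ** A) - b) / (a - b)

s = 1e-4
u, v = (s, 0.0), (0.0, s)   # two positive capacity channels, written as pairs (a, b)
print("distance between channels:", math.dist(u, v))
print("gap between achieving distributions:", abs(untimed_x(*u) - untimed_x(*v)))
print("limiting gap 1 - 2/e:", 1 - 2 / math.e)
```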

Finally, the numbers $1/e$ and $1 - 1/e$ are not coincidental; we will see them again in the discussion of measurement.


2.3. Solvability of the capacity equation for timing channels

In [6], the capacity reduction problem for untimed binary channels was shown to have a canonical solution by exploiting the algebraic structure of channels in place of a second equation. In this problem, one is trying to solve the equation $C(u) = \varphi$ for a noise matrix $u$, given a desired capacity $\varphi$. More generally, one can ask when this equation can be solved for an arbitrary $(n, n)$ timing channel; we now solve this problem. Continuity even offers an approach to obtaining canonical solutions: let us suppose we measure the amount of noise in the channel by

$$h(u) = \sum_{i=1}^{n} H(u_i).$$

Theorem 2.13. Let $u$ be the noise matrix of an $(n, n)$ timing channel with times $t = (t_1, \ldots, t_n)$, and let $\omega > 1$ be the unique positive solution of $\sum_{i=1}^{n} x^{-t_i} = 1$. Then

(i) The equation $C_t(u) = \varphi$ has a solution iff $0 \le \varphi \le \log_2 \omega$.

(ii) If $\varphi \in [0, \log_2 \omega]$, then there is a noise matrix $u$ which not only satisfies $C_t(u) = \varphi$ but which also minimizes $h$ among all such solutions. There is also a noise matrix $u$ which maximizes $h$ among the solutions of $C_t(u) = \varphi$.
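The constant $\omega$ in Theorem 2.13 is easy to compute, because $w \mapsto \sum_i w^{-t_i}$ is strictly decreasing for $w > 0$. A minimal bisection sketch (our own; `shannon_omega` is a hypothetical name):

```python
import math

def shannon_omega(times, tol=1e-12):
    """Root w >= 1 of sum_i w**(-t_i) = 1 for positive times t_i, found by bisection.

    g(w) = sum_i w**(-t_i) - 1 is strictly decreasing for w > 0 and g(1) = n - 1 >= 0,
    so the root is unique and lies in [1, infinity).
    """
    g = lambda w: sum(w ** (-t) for t in times) - 1.0
    lo, hi = 1.0, 2.0
    while g(hi) > 0:              # double hi until the root is bracketed
        lo, hi = hi, 2 * hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for times in [(1, 1), (1, 2), (1, 1, 1)]:
    w = shannon_omega(times)
    print(times, w, math.log2(w))   # log2(w) is the largest achievable C_t
```

For equal times $(1, 1)$ this gives $\omega = 2$, so $\log_2 \omega = 1$ bit per unit time, while times $(1, 2)$ give the golden ratio.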
Proof. Each part of this proof depends crucially on the fact that capacity is continuous.

(i) The mutual information is always nonnegative, and since the information a channel conveys about its input never exceeds the entropy $H(x)$ of that input, it is bounded from above by the mutual information of the noiseless version of our channel:

$$0 \le I_t(u, x) \le \frac{H(x)}{t \cdot x}.$$

Then we have

$$0 \le C_t(u) \le \sup_{x \in \Delta^n} \frac{H(x)}{t \cdot x} = \log_2 \omega,$$

where the equality on the right is due to Shannon [9]. Thus, if we have a solution of $C_t(u) = \varphi$, then $\varphi \in [0, \log_2 \omega]$. Now suppose that we have any $\varphi \in [0, \log_2 \omega]$. Consider a noise matrix $v = (v_1, \ldots, v_n)$ whose rows are identical (so that $f(x) = v_1$ for every $x$, and hence $I_t(v, x) = 0$), and the identity matrix $w = (e_1, \ldots, e_n)$, which corresponds to the noiseless version of our channel. We have

$$C_t(v) = 0 \le \varphi \le \log_2 \omega = C_t(w).$$

Since $C_t$ is continuous and the space of noise matrices is connected (it is a product of simplices, hence convex), there must be $u$ with $C_t(u) = \varphi$.

(ii) By the continuity of $C_t$, the set of solutions

$$S = \{u : C_t(u) = \varphi\} = C_t^{-1}(\{\varphi\})$$

is a closed subset of the compact space of noise matrices. Thus, $S$ itself is compact, and it is nonempty by (i). Since $h$ is continuous on the compact set $S$, it attains both a maximum and a minimum there. □


Matrices can be chosen in the same way to maximize or minimize any continuous quantity, not just entropy.
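The connectedness argument in part (i) of Theorem 2.13 is effectively constructive: along any path of noise matrices from $v$ to $w$, bisection locates a matrix with the desired capacity. The sketch below specializes to the untimed (2, 2) case, where $\omega = 2$ and $\log_2 \omega = 1$, and uses the straight-line path $u(s) = (1 - s)v + sw$ from uniform rows $v$ to the identity $w$; the path and the helper names are our choices. Capacity is evaluated with the closed form for untimed binary channels quoted in Section 3.1.

```python
import math

def h2(p):
    """Binary entropy in bits, with 0 log 0 = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def capacity_2x2(a, b):
    """Untimed capacity of the binary channel with rows (a, 1 - a), (b, 1 - b); C(a, a) := 0."""
    if a == b:
        return 0.0
    e1 = ((1 - a) * h2(b) - (1 - b) * h2(a)) / (a - b)
    e2 = (b * h2(a) - a * h2(b)) / (a - b)
    return math.log2(2 ** e1 + 2 ** e2)

def solve_capacity(phi, tol=1e-12):
    """Return s in [0, 1] with C(u(s)) = phi, where u(s) = (1 - s) v + s w,
    v has both rows (1/2, 1/2) and w is the 2 x 2 identity."""
    if not 0.0 <= phi <= 1.0:
        raise ValueError("no solution: untimed binary capacity lies in [0, 1]")
    # Row 0 of u(s) is ((1 + s)/2, (1 - s)/2) and row 1 is ((1 - s)/2, (1 + s)/2).
    cap = lambda s: capacity_2x2((1 + s) / 2, (1 - s) / 2)
    lo, hi = 0.0, 1.0                  # cap(0) = 0 <= phi <= 1 = cap(1)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cap(mid) < phi:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

s = solve_capacity(0.5)
print(s, capacity_2x2((1 + s) / 2, (1 - s) / 2))
```

Along this particular path every $u(s)$ is a binary symmetric channel with crossover probability $(1 - s)/2$, so the answer can be cross-checked against $1 - H((1 - s)/2)$.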


3. Information theory in topology


3.1. Capacity as a measurement on the interval domain

Recalling the definitions from Section 2.1, we now briefly review domains and measurements.

Definition 3.1. A dcpo is a poset in which every directed set has a supremum. A domain is a continuous dcpo.

The interval domain over $[0, 1]$ is the set of compact subintervals of the unit interval

$$\mathbf{I}[0, 1] = \{[a, b] : a, b \in [0, 1] \ \&\ a \le b\}$$

ordered by reverse inclusion

$$[a, b] \sqsubseteq [c, d] \iff [c, d] \subseteq [a, b].$$

The poset $\mathbf{I}[0, 1]$ is a continuous dcpo where

• For directed $S \subseteq \mathbf{I}[0, 1]$, $\bigsqcup S = \bigcap S$, and

• $x \ll y \iff y \subseteq \mathrm{int}(x)$.

Notice that $\mathrm{int}(x)$ refers to the interior of the interval $x$ in the Euclidean topology of $[0, 1]$, so that $\mathrm{int}[a, b] = (a, b)$ for $a > 0$ and $b < 1$, while $\mathrm{int}[0, b] = [0, b)$ for $b < 1$ and $\mathrm{int}[a, 1] = (a, 1]$ for $a > 0$.
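The characterization of $\ll$ is simple enough to state as code. In this sketch (ours; intervals are represented as hypothetical `(lo, hi)` pairs), the only subtlety is the relative interior at the endpoints 0 and 1:

```python
def way_below(x, y):
    """x << y in the interval domain I[0,1], i.e. y is contained in int(x).

    Intervals are pairs (lo, hi) with 0 <= lo <= hi <= 1. The interior is taken in
    the topology of [0, 1], so an endpoint 0 or 1 of x counts as an interior point.
    """
    (a, b), (c, d) = x, y
    left_ok = a == 0.0 or a < c     # int[0, b] keeps 0; otherwise need a < c
    right_ok = b == 1.0 or d < b    # int[a, 1] keeps 1; otherwise need d < b
    return left_ok and right_ok

print(way_below((0.0, 1.0), (0.3, 0.3)))   # [0, 1] is the least element
print(way_below((0.2, 0.5), (0.2, 0.4)))   # shared left endpoint 0.2 is not interior
```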

Definition 3.2. The Scott topology on a continuous dcpo $D$ has as a basis all sets of the form ${\uparrow\!\!\uparrow} x$ for $x \in D$.

Example 3.3. A basic Scott open set in $\mathbf{I}[0, 1]$ is
$${\uparrow\!\!\uparrow}[a, b] = \{x \in \mathbf{I}[0, 1] : x \subseteq \mathrm{int}([a, b])\}.$$

In the lower half of the unit square, such a set forms a right triangle whose hypotenuse lies along the diagonal, but whose other two sides are removed.

A function $f : D \to E$ between domains is Scott continuous if the inverse image of a Scott open set in $E$ is Scott open in $D$. This is equivalent to saying that $f$ is monotone,

$$(\forall x, y \in D)\ x \sqsubseteq y \implies f(x) \sqsubseteq f(y),$$

and that it preserves directed suprema:

$$f\left(\bigsqcup S\right) = \bigsqcup f(S)$$

for all directed $S \subseteq D$. In particular, for the domain $[0, \infty)^*$ of nonnegative reals in their opposite order, it can be shown that a function $\mu : \mathbf{I}[0, 1] \to [0, \infty)^*$ is Scott continuous iff

(1) For all $x, y \in \mathbf{I}[0, 1]$, $x \sqsubseteq y \implies \mu x \ge \mu y$, and

(2) If $(x_n)$ is an increasing sequence in $\mathbf{I}[0, 1]$, then

$$\mu\left(\bigsqcup_n x_n\right) = \lim_{n \to \infty} \mu x_n.$$


This is the case of Scott continuity that we are most interested in presently:

Definition 3.4. A Scott continuous $\mu : D \to [0, \infty)^*$ is said to measure the content of $x \in D$ if for all Scott open sets $U \subseteq D$,
$$x \in U \implies (\exists \varepsilon > 0)\ x \in \mu_\varepsilon(x) \subseteq U$$
where
$$\mu_\varepsilon(x) := \{y \in D : y \sqsubseteq x \ \&\ |\mu x - \mu y| < \varepsilon\}$$
are called the $\varepsilon$-approximations of $x$.

We often refer to $\mu$ as simply 'measuring' $x \in D$, or as measuring $X \subseteq D$ when it measures each element of $X$. The last definition, as well as the next, extends easily to maps $\mu$ that take values in an arbitrary domain $E$.

Definition 3.5. A measurement $\mu : D \to [0, \infty)^*$ is a Scott continuous map that measures the content of $\ker(\mu) := \{x \in D : \mu x = 0\}$.

The order on a domain $D$ defines a clear sense in which one object has 'more information' than another: a qualitative view of information content. The definition of measurement attempts to identify those monotone mappings $\mu$ which offer a quantitative measure of information content in the sense specified by the order. The essential point in the definition of measurement is that $\mu$ measure content in a manner that is consistent with the particular view offered by the order. There are plenty of monotone mappings that are not measurements, and while some of them may measure information content in some other sense, each such sense must first be specified by a different information order. The definition of measurement can then be seen as a minimal test that a function $\mu$ must pass if we are to regard it as providing a measure of information content.

An explicit formula for the capacity [5] of an untimed (2, 2) channel is

$$C(a, b) = \log_2\left(2^{\frac{\bar{a}H(b) - \bar{b}H(a)}{a-b}} + 2^{\frac{bH(a) - aH(b)}{a-b}}\right)$$

where $C(a, a) := 0$, $H(x) = -x\log_2(x) - (1 - x)\log_2(1 - x)$ is the base two entropy and $\bar{x} := 1 - x$. Capacity can be regarded as a function on the interval domain $\mathbf{I}[0, 1]$ by setting $C[a, b] := C(a, b) = C(b, a)$. In [6], it is shown that capacity $C : \mathbf{I}[0, 1] \to [0, 1]^*$ is Scott continuous from the interval domain to the unit interval in its dual order, i.e. capacity decreases as we move up in the order on intervals. What was far from clear at the time, though, was whether or not capacity was a measurement. We will now prove that it is. Our proof turns on a deep connection between the study of measurement in domain theory and the striking result of Majani and Rumsey from information theory:


Theorem 3.6 (Majani & Rumsey). The capacity achieving distribution $x$ of an untimed $(2, n)$ channel with positive capacity satisfies $1/e < x < 1 - 1/e$.

To see what this result says in the case we are interested in, let $A = \frac{H(a) - H(b)}{a - b}$. Then the capacity achieving distribution for an untimed (2, 2) channel is given by

$$x(a, b) = \frac{1}{a - b}\left(\frac{1}{1 + 2^{A}} - b\right). \qquad (2)$$
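Eq. (2) and the closed form for $C(a, b)$ can be cross-checked numerically. The sketch below (our own helper names) evaluates both on a grid of untimed (2, 2) channels with $a \neq b$, compares the mutual information at $x(a, b)$ with $C(a, b)$, and reports how far $x(a, b)$ ranges; by Theorem 3.6 it should stay strictly inside $(1/e, 1 - 1/e)$. A grid check of this kind is an illustration, not a proof.

```python
import math

def h2(p):
    """Base-two binary entropy, with 0 log 0 = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def capacity(a, b):
    """Closed-form untimed capacity C(a, b), with C(a, a) := 0."""
    if a == b:
        return 0.0
    e1 = ((1 - a) * h2(b) - (1 - b) * h2(a)) / (a - b)
    e2 = (b * h2(a) - a * h2(b)) / (a - b)
    return math.log2(2 ** e1 + 2 ** e2)

def achieving_x(a, b):
    """Eq. (2): the capacity achieving probability of input 0, for a != b."""
    A = (h2(a) - h2(b)) / (a - b)
    return (1 / (1 + 2 ** A) - b) / (a - b)

def mutual_information(a, b, x):
    """Base-two I(x) = H(f(x)) - x H(a) - (1 - x) H(b), where f(x) = (a - b) x + b."""
    return h2((a - b) * x + b) - x * h2(a) - (1 - x) * h2(b)

grid = [k / 20 for k in range(21)]
xs = [achieving_x(a, b) for a in grid for b in grid if a != b]
print(min(xs), max(xs))            # compare with the two bounds below
print(1 / math.e, 1 - 1 / math.e)
```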


Thus, by the result of Majani and Rumsey, the quantity $x(a, b)$ always lies in the open interval $(1/e, 1 - 1/e)$ whenever $(a, b)$ has positive capacity, i.e. when $a$ and $b$ are not equal. This remarkable fact was first observed by Silverman in 1955, but without proof. It is a beautiful result, and quite mysterious. Its beauty can obscure the fact that it is also a useful tool for problem solving. In our case, it is the key observation needed:

Lemma 3.7. For any binary channel $(a, b) \in [0, 1]^2$,

$$C(a, b) \ge \frac{(a - b)^2}{e^2 \ln(2)} \ge \frac{(a - b)^2}{6}.$$

Proof. First, let $(a, b)$ be a positive capacity channel. Then $a \neq b$. By the statement of the lemma and the symmetry of capacity, we can assume $a > b$. Consider the base two mutual information

$$I(x) = H(f(x)) - xH(a) - (1 - x)H(b).$$

Notice that $I'(x) = H'(f(x))(a - b) - (H(a) - H(b))$ and $I''(x) = H''(f(x))(a - b)^2$ for $x \in (0, 1)$. Let $x(a, b)$ denote the unique capacity achieving distribution for $(a, b)$.
(i) For the unique capacity achieving distribution $x(a, b)$, $I'(x(a, b)) = 0$, which can be verified by direct substitution if one wishes, since we have a formula for it;

(ii) The point $x(a, b)$ is in $(0, 1)$: since $I(0) = 0$ and $I(1) = 0$, $x(a, b) = 0$ or $x(a, b) = 1$ would imply that $C(a, b) = I(x(a, b)) = 0$;

(iii) $I'' < 0$ on $(0, 1)$, which follows from $\ln(2) \cdot H''(t) = -1/(t(1 - t)) < 0$, which itself follows from $H'(t) = \log_2((1 - t)/t)$.

Then, since $I'$ is strictly decreasing and vanishes at $x(a, b)$, we have $I' > 0$ on the nonempty interval $(0, x(a, b))$, so let $x \in (0, x(a, b))$ be any point. By the mean value theorem,

$$(\exists c \in (0, x))\ I(x) - I(0) = I(x) - 0 = I(x) = I'(c)(x - 0).$$

Because $I'' < 0$, $I'$ is strictly decreasing, so $I'(c) > I'(x)$ and thus

$$C(a, b) \ge I(x) > x \cdot I'(x) = x \cdot \bigl(H'(f(x))(a - b) - (H(a) - H(b))\bigr).$$

Now notice that, by (i),

$$H(a) - H(b) = (a - b)H'(f(x(a, b))).$$

Thus, our lower bound on capacity can be rewritten as

$$C(a, b) > x(a - b)\bigl[H'(f(x)) - H'(f(x(a, b)))\bigr] > 0.$$

The function $f(x) = (a - b)x + b$ is strictly increasing since $a > b$, so $f(x) < f(x(a, b))$, and we can once again apply the mean value theorem, this time to $H'$ on the interval $[f(x), f(x(a, b))]$, to obtain $d \in (f(x), f(x(a, b)))$ such that

$$H'(f(x)) - H'(f(x(a, b))) = H''(d)\bigl(f(x) - f(x(a, b))\bigr) = H''(d)(a - b)(x - x(a, b)).$$

Our lower bound on capacity is now

$$C(a, b) > x(a - b)^2 (x - x(a, b)) H''(d) = \frac{x(a - b)^2 (x(a, b) - x)}{d(1 - d)\ln(2)} > 0.$$

The absolute minimum of $1/(t(1 - t))$ on $(0, 1)$ is 4, so

$$C(a, b) > \frac{4(a - b)^2 (x(a, b) - x)x}{\ln(2)} > 0.$$

Now we choose the value of $x$ that maximizes $(x(a, b) - x)x$, which is the midpoint $x = x(a, b)/2 \in (0, x(a, b))$, giving us a lower bound on capacity of

$$C(a, b) > \frac{x(a, b)^2 (a - b)^2}{\ln(2)} > 0.$$

By the result of Majani and Rumsey, we know that $x(a, b) > 1/e$, so

$$C(a, b) > \frac{(a - b)^2}{e^2 \ln(2)} > 0.$$

Since we trivially have equality when $a = b$, this proves the first inequality; the second holds because $e^2 \ln(2) \approx 5.12 < 6$. □
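A quick numerical check of Lemma 3.7, which is no substitute for the proof: the sketch below computes the smallest ratio of $C(a, b)$ to the bound $(a - b)^2/(e^2 \ln 2)$ over a grid of channels, using the closed form for untimed binary capacity from earlier in this section. The grid size and helper names are our assumptions.

```python
import math

def h2(p):
    """Base-two binary entropy, with 0 log 0 = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def capacity(a, b):
    """Closed-form untimed capacity of the binary channel (a, b); C(a, a) := 0."""
    if a == b:
        return 0.0
    e1 = ((1 - a) * h2(b) - (1 - b) * h2(a)) / (a - b)
    e2 = (b * h2(a) - a * h2(b)) / (a - b)
    return math.log2(2 ** e1 + 2 ** e2)

def worst_ratio(steps=50):
    """Minimum over grid channels a != b of C(a, b) / ((a - b)^2 / (e^2 ln 2)).
    Lemma 3.7 predicts a value of at least 1."""
    k = math.e ** 2 * math.log(2)          # about 5.12, which is below 6
    return min(capacity(i / steps, j / steps) * k / ((i - j) / steps) ** 2
               for i in range(steps + 1) for j in range(steps + 1) if i != j)

print(worst_ratio(), math.e ** 2 * math.log(2))
```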

Recall that capacity can be regarded as a map on the interval domain by setting $C[a, b] = C(a, b)$.

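Theorem 3.8. The capacity $C : \mathbf{I}[0, 1] \to [0, 1]^*$ is a measurement.

Proof. The length function $\mu[a, b] = b - a$ is a measurement, and so is $\mu^2$, since it arises as the composition of $\mu$ and the isomorphism $t \mapsto t^2$ on $[0, \infty)^*$. Any positive multiple of a measurement is another, so $\mu^2/6$ is a measurement. Finally, $C$ is Scott continuous [6] and satisfies $C \ge \mu^2/6$ by the last lemma. Both maps vanish exactly on the maximal elements $[x]$, and for such an $x$, any $y \sqsubseteq x$ with $C(y) < \varepsilon$ also satisfies $\mu(y)^2/6 < \varepsilon$; hence every $\varepsilon$-approximation of $x$ with respect to $C$ is contained in the corresponding $\varepsilon$-approximation with respect to $\mu^2/6$, and $C$ is also a measurement. □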

In fact, the proof above also shows that capacity is a Lebesgue measurement. Recall that Lebesgue measurements are the measurements which extend to the convex powerdomain; they capture metrizability on continuous posets and complete metrizability on continuous domains [4]. Interestingly, capacity is a naturally occurring example of a Lebesgue measurement that satisfies neither of the following domain theoretic variants of the triangle inequality:

• For all $x, y$ with an upper bound, there is $z \sqsubseteq x, y$ with $\mu z \le \mu x + \mu y$;

• For all $x, y$ with an upper bound, there is $z \sqsubseteq x, y$ with $\mu z \le 2 \cdot \max\{\mu x, \mu y\}$;

as the following example shows:

Example 3.9. Let $x = [0, 1/2]$ and $y = [1/2, 1]$. Then $x$ and $y$ have a common upper bound, namely $[1/2]$, and

$$C(x) = \log_2(5/4) \quad\&\quad C(y) = \log_2(5/4).$$

However, there does not exist $z \sqsubseteq x, y$ satisfying either of the triangle inequality variants: the only possible choice is $z = [0,1]$, for which $C(z) = 1$, whereas

$$C(x) + C(y) = 2 \cdot \max\{C(x), C(y)\} = \log_2(25/16) < \log_2(32/16) = \log_2(2) = 1.$$

This also shows that capacity does not satisfy the triangle inequality as a function from $[0,1]^2$ to $[0,1]$.
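The numbers in Example 3.9 are easy to reproduce. The sketch below assumes the closed-form binary capacity (base-two entropy) and evaluates it at the three intervals of the example; the helper names are hypothetical.

```python
import math


def h2(p: float) -> float:
    """Base-two binary entropy, with the convention 0 * log(0) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def capacity(a: float, b: float) -> float:
    """Capacity (bits) of the binary channel with P(0|0) = a and P(0|1) = b."""
    if abs(a - b) < 1e-12:
        return 0.0
    e1 = ((1.0 - a) * h2(b) - (1.0 - b) * h2(a)) / (a - b)
    e2 = (b * h2(a) - a * h2(b)) / (a - b)
    return math.log2(2.0 ** e1 + 2.0 ** e2)


c_x = capacity(0.0, 0.5)  # x = [0, 1/2]
c_y = capacity(0.5, 1.0)  # y = [1/2, 1]
c_z = capacity(0.0, 1.0)  # z = [0, 1], the only interval below both x and y
print(c_x, c_y, math.log2(5 / 4))
print(c_x + c_y < c_z)  # both triangle inequality variants fail
```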
Because capacity violates the triangle inequality, it is not obvious that its ε-balls form a basis for a topology; this follows because capacity is a measurement and measurements always yield a basis for the Scott topology on their kernel. In more detail, given a measurement $\mu : D \to [0,\infty)^*$ on a domain $D$ with a least element, we can define a function $d : D^2 \to [0,\infty)^*$ as

$$d(x,y) = \inf\{\mu z : z \sqsubseteq x, y\}.$$

By [3], the relative Scott topology on $\ker\mu = \{x \in D : \mu x = 0\}$ has as a basis sets of the form

$$B_\varepsilon(x) = \{y \in \ker\mu : d(x,y) < \varepsilon\}$$

where $x \in \ker\mu$ and $\varepsilon > 0$. If we take $D = I[0,1]$ and $\mu$ to be capacity $C$ on $I[0,1]$, then

$$d([a],[b]) = C([a] \sqcap [b]) = C[a,b]$$

assuming $a \le b$. Because we have a homeomorphism $\ker(\mu) \simeq [0,1]$ between the relative Scott topology and the Euclidean topology, we have proven:

Corollary 3.10. The untimed capacity $C : [0,1]^2 \to [0,\infty)$ satisfies

(i)  C(a,  b)  =  C(b,  a), 

(ii)  C(a,  b)  =  0  iff  a  =  b, 

and the sets $\{y \in [0,1] : C(x,y) < \varepsilon\}$ for $x \in [0,1]$ and $\varepsilon > 0$ form a basis for the Euclidean topology on $[0,1]$.
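Because capacity is monotone with respect to interval inclusion, each basic set $\{y \in [0,1] : C(x,y) < \varepsilon\}$ is an interval containing $x$, and its endpoints can be located by bisection. The sketch below does this with the closed-form binary capacity; the function names, the number of bisection steps and the sample point $x = 0.3$ are illustrative assumptions.

```python
import math


def h2(p: float) -> float:
    """Base-two binary entropy, with the convention 0 * log(0) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def capacity(a: float, b: float) -> float:
    """Capacity (bits) of the binary channel with P(0|0) = a and P(0|1) = b."""
    if abs(a - b) < 1e-12:
        return 0.0
    e1 = ((1.0 - a) * h2(b) - (1.0 - b) * h2(a)) / (a - b)
    e2 = (b * h2(a) - a * h2(b)) / (a - b)
    return math.log2(2.0 ** e1 + 2.0 ** e2)


def ball(x: float, eps: float, steps: int = 60) -> tuple[float, float]:
    """Approximate endpoints of {y in [0, 1] : C(x, y) < eps}.

    Relies on C(x, y) growing as y moves away from x, so the set is an interval.
    """
    def edge(end: float) -> float:
        if capacity(x, end) < eps:
            return end  # the whole segment from x to `end` lies in the ball
        inside, outside = x, end
        for _ in range(steps):
            mid = (inside + outside) / 2.0
            if capacity(x, mid) < eps:
                inside = mid
            else:
                outside = mid
        return inside  # a point of the ball next to its boundary

    return edge(0.0), edge(1.0)


for eps in (0.1, 0.01, 0.001):
    lo, hi = ball(0.3, eps)
    print(f"eps={eps}: ball around 0.3 is roughly ({lo:.4f}, {hi:.4f})")
```

As ε shrinks the balls shrink onto $x$, which is the behaviour the corollary describes.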

We  will  show  later  that  the  result  above  extends  to  timed  capacity.  But  first,  let  us  illustrate  the  importance  of  the  order 
theoretic  properties  possessed  by  capacity. 

3.2.  An  example  from  quantum  communication 

Let $\mathcal{H}$ be the state space for a two dimensional quantum system. Two parties communicate with each other as follows. First, they agree up front on a fixed basis of $\mathcal{H}$, say $\{|\psi\rangle, |\varphi\rangle\}$, which can be expressed in some fixed basis $\{|0\rangle, |1\rangle\}$ as

$$|\psi\rangle = a|0\rangle + b|1\rangle \quad\&\quad |\varphi\rangle = c|0\rangle + d|1\rangle$$

where the amplitudes $a, b, c, d$ are all complex. The state $|\psi\rangle$ is taken to mean ‘0’, while the state $|\varphi\rangle$ is taken to mean ‘1’. The first party, the sender, attempts to send one of these two qubits $|\chi\rangle \in \{|\psi\rangle, |\varphi\rangle\}$ to the second party, the receiver. The second party receives some qubit and performs a measurement in the agreed upon basis. The result of this measurement is one of the qubits $\{|\psi\rangle, |\varphi\rangle\}$, which is then interpreted as meaning either a ‘0’ or a ‘1’.

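We say some qubit because as $|\chi\rangle$ travels, it suffers an unwanted interaction with its environment, whose effect on density operators can be described as

$$\mathcal{E}(\rho) = E_0\rho E_0^\dagger + E_1\rho E_1^\dagger$$

where the operation elements are given (in the basis $\{|0\rangle, |1\rangle\}$) by

$$E_0 = \begin{pmatrix} 1 & 0 \\ 0 & \sqrt{1-\lambda} \end{pmatrix} \quad\&\quad E_1 = \begin{pmatrix} 0 & \sqrt{\lambda} \\ 0 & 0 \end{pmatrix}.$$

This effect is known as amplitude damping and the parameter $\lambda \in [0,1]$ can be thought of as the probability of losing a photon. Thus, the receiver does not necessarily acquire the qubit $|\chi\rangle$, but instead receives some degradation of it, describable by the density operator $\mathcal{E}(|\chi\rangle\langle\chi|)$.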

The probability that ‘0’ is received when ‘0’ is sent is

$$\alpha = P(0|0) = -2|a|^4 p(\lambda) + |a|^2(\lambda + 2p(\lambda)) + 1 - \lambda$$

while the probability that ‘0’ is received when ‘1’ is sent is

$$\beta = P(0|1) = 2|a|^4 p(\lambda) + |a|^2(\lambda - 2p(\lambda))$$

where $p(\lambda) = -1 + \lambda + \sqrt{1-\lambda} \ge 0$. Thus, each choice of basis defines a classical binary channel $(\alpha, \beta)$. Notice that the probabilities $\alpha$ and $\beta$ only depend on $|a|^2$ because $|d|^2 = |a|^2$ and $|b|^2 = |c|^2 = 1 - |a|^2$ by the orthonormality of $|\psi\rangle$ and $|\varphi\rangle$, and because the initial expressions for $\alpha$ and $\beta$ turn out to only depend on modulus squared terms. Because the basis is fixed, $|a|^2 \in [0,1]$ is a constant and we obtain a function $x : [0,1] \to I[0,1]$ of $\lambda$ given by

$$x(\lambda) = [\beta(\lambda), \alpha(\lambda)].$$
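The closed forms for $\alpha$ and $\beta$ can be cross-checked by applying the amplitude damping operation elements directly to density matrices and computing the probability of the outcome $|\psi\rangle$. The sketch below is a minimal pure-Python version with $2 \times 2$ complex matrices as nested lists. The particular basis $|\psi\rangle = \cos\theta|0\rangle + e^{i\phi}\sin\theta|1\rangle$, $|\varphi\rangle = -b^*|0\rangle + a^*|1\rangle$ and the angles chosen are illustrative assumptions, and all helper names are hypothetical.

```python
import cmath
import math

Matrix = list[list[complex]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def dagger(A: Matrix) -> Matrix:
    return [[A[j][i].conjugate() for j in range(2)] for i in range(2)]


def damp(rho: Matrix, lam: float) -> Matrix:
    """Amplitude damping: E0 rho E0^dagger + E1 rho E1^dagger."""
    E0 = [[1 + 0j, 0j], [0j, complex(math.sqrt(1 - lam))]]
    E1 = [[0j, complex(math.sqrt(lam))], [0j, 0j]]
    t0 = matmul(matmul(E0, rho), dagger(E0))
    t1 = matmul(matmul(E1, rho), dagger(E1))
    return [[t0[i][j] + t1[i][j] for j in range(2)] for i in range(2)]


def projector(v: list[complex]) -> Matrix:
    """|v><v| as a density matrix."""
    return [[v[i] * v[j].conjugate() for j in range(2)] for i in range(2)]


def expectation(v: list[complex], M: Matrix) -> float:
    """<v| M |v>, i.e. the probability of outcome v when measuring state M."""
    return sum(v[i].conjugate() * M[i][j] * v[j] for i in range(2) for j in range(2)).real


def closed_forms(u: float, lam: float) -> tuple[float, float]:
    """alpha(lambda) and beta(lambda) from the text, with u = |a|^2."""
    p = -1 + lam + math.sqrt(1 - lam)
    return (-2 * u**2 * p + u * (lam + 2 * p) + 1 - lam,
            2 * u**2 * p + u * (lam - 2 * p))


theta, phase = 0.7, 1.3  # arbitrary illustrative basis parameters
a = complex(math.cos(theta))
b = math.sin(theta) * cmath.exp(1j * phase)
psi = [a, b]                           # |psi> encodes '0'
phi = [-b.conjugate(), a.conjugate()]  # orthogonal |phi> encodes '1'

max_err = 0.0
for k in range(11):
    lam = k / 10
    alpha_sim = expectation(psi, damp(projector(psi), lam))  # P(0|0)
    beta_sim = expectation(psi, damp(projector(phi), lam))   # P(0|1)
    alpha, beta = closed_forms(abs(a) ** 2, lam)
    max_err = max(max_err, abs(alpha - alpha_sim), abs(beta - beta_sim))
print(f"largest discrepancy: {max_err:.2e}")
```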

Let  us  now  establish  its  domain  theoretic  nature. 

Proposition 3.11. The trajectory $x : [0,1] \to I[0,1]$ is Scott continuous.
Proof. First we prove that $\alpha(\lambda) \ge \beta(\lambda)$, so that we know $x$ actually maps into the interval domain. This holds because

$$\alpha(\lambda) - \beta(\lambda) = 4p(\lambda)|a|^2(1 - |a|^2) + 1 - \lambda,$$

which is nonnegative since $p(\lambda) \ge 0$, $|a|^2 \le 1$ and $\lambda \in [0,1]$. To prove that $x$ is monotone, we show that $\alpha'(\lambda) \le 0$ and $\beta'(\lambda) \ge 0$. First notice that $p'(\lambda) = 1 - 1/(2\sqrt{1-\lambda}) \le 1/2$. Since $0 \le |a|^2 \le 1$, this gives $2|a|^2 p'(\lambda) \le 1$, which implies that

$$\alpha'(\lambda) = (1 - |a|^2)(2|a|^2 p'(\lambda) - 1) \le 0$$

and that

$$\beta'(\lambda) = -2|a|^2 p'(\lambda)(1 - |a|^2) + |a|^2 \ge -|a|^2(1 - |a|^2) + |a|^2 = |a|^4 \ge 0,$$

which shows that $x$ is monotone as a trajectory from $[0,1]$ in its usual order to $I[0,1]$. Because $x$ has continuous measure (with respect to the length measurement), $x$ is Scott continuous. □
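The three inequalities used in this proof ($\alpha \ge \beta$, $\alpha$ nonincreasing, $\beta$ nondecreasing) can be sanity-checked on a grid of $|a|^2$ and $\lambda$ values. The sketch below uses only the closed forms from the text; the grid sizes and tolerance are arbitrary, and the helper name `alpha_beta` is hypothetical.

```python
import math


def alpha_beta(u: float, lam: float) -> tuple[float, float]:
    """alpha(lambda) = P(0|0) and beta(lambda) = P(0|1), with u = |a|^2."""
    p = -1.0 + lam + math.sqrt(1.0 - lam)
    alpha = 1.0 - lam - 2.0 * u**2 * p + u * (lam + 2.0 * p)
    beta = 2.0 * u**2 * p + u * (lam - 2.0 * p)
    return alpha, beta


tol = 1e-12
ok = True
for i in range(21):
    u = i / 20
    prev_alpha, prev_beta = alpha_beta(u, 0.0)
    for k in range(1, 201):
        alpha, beta = alpha_beta(u, k / 200)
        ok &= alpha >= beta - tol        # x(lambda) is a genuine interval
        ok &= alpha <= prev_alpha + tol  # alpha is nonincreasing in lambda
        ok &= beta >= prev_beta - tol    # beta is nondecreasing in lambda
        prev_alpha, prev_beta = alpha, beta
print(ok)
```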

One valuable aspect of $x$ being Scott continuous is that we can now make precise the connection between quantum information's intuitive use of the word ‘noise’ and information theory's precise account of it: since $x$ is monotone and capacity, being a measurement, reverses the order on $I[0,1]$, the quantity $C(x(\lambda))$ decreases as $\lambda$ increases, i.e. the amount of information that the two parties can communicate decreases as the probability of losing a photon increases. In the extreme cases,

$$x(0) = [0,1] \quad\&\quad x(1) = [|a|^2, |a|^2]$$


yielding respective capacities of 1 and 0. There is a more fundamental idea at work in this example and in many others like it: we have learned about capacity by only examining how the probabilities in the noise matrix change, and this more than justifies the domain theoretic approach. Imagine what would happen if we actually tried to calculate $C(x(\lambda))$ explicitly: we would have to substitute $\alpha(\lambda) = -2|a|^4 p(\lambda) + |a|^2(\lambda + 2p(\lambda)) + 1 - \lambda$ for $a$ and $\beta(\lambda) = 2|a|^4 p(\lambda) + |a|^2(\lambda - 2p(\lambda))$ for $b$ into the formula

$$C(a,b) = \log_2\left(2^{\frac{\bar{a}H(b) - \bar{b}H(a)}{a-b}} + 2^{\frac{bH(a) - aH(b)}{a-b}}\right),$$

where $\bar{a} = 1 - a$, $\bar{b} = 1 - b$ and $H$ is the base two entropy, and then seek to show that the resulting quantity decreases as $\lambda$ increases. The reader still unconvinced about the merits of the domain theoretic approach, or who believes that the domain theoretic approach is ‘not necessary’, is more than welcome to provide an alternative, with one caveat: any alternative approach should yield new results the way the domain theoretic approach employed above has.
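For contrast with the domain theoretic argument, the brute-force route can at least be carried out numerically: substitute $\alpha(\lambda)$ and $\beta(\lambda)$ into the closed-form capacity and tabulate $C(x(\lambda))$ on a grid. The sketch below does exactly that for one illustrative value $|a|^2 = 0.3$; it is evidence on a grid, not the proof the text asks for, and the helper names are hypothetical.

```python
import math


def h2(p: float) -> float:
    """Base-two binary entropy, with the convention 0 * log(0) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def capacity(a: float, b: float) -> float:
    """Capacity (bits) of the binary channel with P(0|0) = a and P(0|1) = b."""
    if abs(a - b) < 1e-12:
        return 0.0
    e1 = ((1.0 - a) * h2(b) - (1.0 - b) * h2(a)) / (a - b)
    e2 = (b * h2(a) - a * h2(b)) / (a - b)
    return math.log2(2.0 ** e1 + 2.0 ** e2)


def alpha_beta(u: float, lam: float) -> tuple[float, float]:
    """alpha(lambda) = P(0|0) and beta(lambda) = P(0|1), with u = |a|^2."""
    p = -1.0 + lam + math.sqrt(1.0 - lam)
    return (1.0 - lam - 2.0 * u**2 * p + u * (lam + 2.0 * p),
            2.0 * u**2 * p + u * (lam - 2.0 * p))


u = 0.3  # |a|^2 for an arbitrarily chosen basis
values = []
for k in range(101):
    alpha, beta = alpha_beta(u, k / 100)
    values.append(capacity(alpha, beta))  # C(x(lambda)) with x(lambda) = [beta, alpha]

print(values[0], values[-1])  # capacities at lambda = 0 and lambda = 1
print(all(later <= earlier + 1e-12 for earlier, later in zip(values, values[1:])))
```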


3.3.  Timed  capacity  as  a  measure  of  distance 


We will show that binary timing capacity $C_t(a,b)$ is a distance function that yields the Euclidean topology on $[0,1]$. First, though, we derive an equation which characterizes the capacity of a binary timing channel as an implicit function of its noise
matrix  and  the  channel  times.  The  equation  is  an  interesting  result  in  its  own  right,  but  our  primary  motivation  for  deriving  it 
is  that  it  allows  for  derivation  of  the  symmetry  relations  satisfied  by  timed  capacity,  which  are  ultimately  needed  to  explain 
it  as  providing  a  measure  of  distance  on  the  unit  interval. 

Recall from our discussion earlier that the noise matrix $u$ of a $(2,2)$ timing channel is

$$u = \begin{pmatrix} a & \bar{a} \\ b & \bar{b} \end{pmatrix}$$

where $a = P(0|0)$, $\bar{a} = 1 - a = P(1|0)$, $b = P(0|1)$ and $\bar{b} = 1 - b = P(1|1)$, so that $u$ can be represented by a pair of probabilities $(a,b)$. The capacity $C_t(a,b)$ of a binary timing channel $(a,b)$ with times $t = (t_1, t_2)$ and $a \ne b$ is given by

$$\frac{H(a)(b-1) + H(b)(1-a)}{a-b} - \ln(\Phi(a,b)) = \ln(2) \cdot t_1 \cdot C_t(a,b)$$

where $\Phi(a,b)$ is the unique solution³ of

$$e^{-K/(a-b)}\,x^{t_2} - (1-x)^{t_1} = 0$$

on the interval $[0,1]$, with $K = (bs - t_2)H(a) + (t_2 - as)H(b)$, $s = t_2 - t_1$, and $H$ the base $e$ entropy

$$H(x) = -x\ln(x) - (1-x)\ln(1-x).$$


It is remarkable that the dependence on $\Phi(a,b)$ can be eliminated.


³ $\Phi(a,b)$ is not the capacity achieving distribution $x(a,b)$, but is related to it by $\Phi(a,b) = (a-b)x(a,b) + b$.
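The characterization above can be turned into a small numerical procedure: find $\Phi(a,b)$ by bisection (the left-hand side of its defining equation is increasing in $x$, equals $-1$ at $x = 0$ and is positive at $x = 1$), then read off $C_t(a,b)$. The sketch below assumes $a \ne b$, uses the base-$e$ entropy as in the text, and its function names are hypothetical. In the untimed case $t_1 = t_2 = 1$ it should agree with the closed-form binary capacity, and $(\Phi - b)/(a - b)$ should recover the capacity achieving distribution of footnote 3, which by the result of Majani and Rumsey lies in $[1/e, 1 - 1/e]$.

```python
import math


def H(p: float) -> float:
    """Base-e binary entropy with the convention 0 ln 0 = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def solve_phi(a: float, b: float, t1: float, t2: float, steps: int = 200) -> float:
    """Unique root in [0, 1] of exp(-K/(a-b)) x^t2 - (1-x)^t1 = 0 (requires a != b)."""
    s = t2 - t1
    K = (b * s - t2) * H(a) + (t2 - a * s) * H(b)
    w = math.exp(-K / (a - b))
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if w * mid ** t2 - (1.0 - mid) ** t1 < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def timed_capacity_phi(a: float, b: float, t1: float, t2: float) -> float:
    """C_t(a, b) in bits per unit time, from the Phi-based characterization."""
    phi = solve_phi(a, b, t1, t2)
    g = (H(a) * (b - 1.0) + H(b) * (1.0 - a)) / (a - b)
    return (g - math.log(phi)) / (math.log(2.0) * t1)


def untimed_capacity(a: float, b: float) -> float:
    """Closed-form binary capacity in bits (base-e entropies rescaled to bits)."""
    h2a, h2b = H(a) / math.log(2.0), H(b) / math.log(2.0)
    e1 = ((1.0 - a) * h2b - (1.0 - b) * h2a) / (a - b)
    e2 = (b * h2a - a * h2b) / (a - b)
    return math.log2(2.0 ** e1 + 2.0 ** e2)


a, b = 0.8, 0.3
print(timed_capacity_phi(a, b, 1.0, 1.0), untimed_capacity(a, b))
x_opt = (solve_phi(a, b, 1.0, 1.0) - b) / (a - b)  # footnote 3, untimed case
print(1 / math.e <= x_opt <= 1 - 1 / math.e)
print(timed_capacity_phi(a, b, 1.0, 2.0))  # the symbol '1' now takes twice as long
```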
Theorem 3.12. The capacity of a $(2,2)$ timing channel is $c_p > 0$ iff its noise matrix $u = (a,b)$ satisfies $a \ne b$ and

$$e^{(H(a)(b-1)+H(b)(1-a))/(a-b)} \cdot \frac{1}{2^{t_1 c_p}} + e^{(bH(a)-aH(b))/(a-b)} \cdot \frac{1}{2^{t_2 c_p}} = 1. \qquad (*)$$

Its  capacity  is  zero  iff  a  =  b. 

Proof. First notice that $K$ can be written as

$$K = t_2[H(a)(b-1) + H(b)(1-a)] + t_1(aH(b) - bH(a)).$$

Let us abbreviate $\Phi(a,b)$ to $\Phi$. If we multiply the equation for $c_p$ (the capacity equation above with $C_t(a,b) = c_p$) by $-1$ and then exponentiate both sides we get

$$e^{-(H(a)(b-1)+H(b)(1-a))/(a-b)}\,\Phi = 2^{-t_1 c_p}$$

so raising both sides to the power $t_2$ gives

$$e^{-t_2(H(a)(b-1)+H(b)(1-a))/(a-b)}\,\Phi^{t_2} = 2^{-t_1 t_2 c_p}.$$

Using the equation which defines $\Phi$ and our above remark about $K$, we see that

$$e^{-t_2(H(a)(b-1)+H(b)(1-a))/(a-b)}\,\Phi^{t_2} = (1-\Phi)^{t_1}\,e^{t_1(aH(b)-bH(a))/(a-b)}.$$

Then we must have

$$(1-\Phi)^{t_1}\,e^{t_1(aH(b)-bH(a))/(a-b)} = 2^{-t_1 t_2 c_p}$$

which gives

$$1 - \Phi = 2^{-t_2 c_p}\,e^{(bH(a)-aH(b))/(a-b)}.$$

However, if we solve the $c_p$ equation for $\Phi$, then we see that

$$\Phi = 2^{-t_1 c_p}\,e^{(H(a)(b-1)+H(b)(1-a))/(a-b)}.$$

Because $\Phi + (1 - \Phi) = 1$, we get

$$e^{(H(a)(b-1)+H(b)(1-a))/(a-b)} \cdot \frac{1}{2^{t_1 c_p}} + e^{(bH(a)-aH(b))/(a-b)} \cdot \frac{1}{2^{t_2 c_p}} = 1.$$

For the converse, assume the noise matrix $(a,b)$ of a channel satisfies (*) with $a \ne b$. Let us denote its capacity by $k_p > 0$. We need to prove that $k_p = c_p$. By the work we just did, we know that

$$e^{(H(a)(b-1)+H(b)(1-a))/(a-b)} \cdot \frac{1}{2^{t_1 k_p}} + e^{(bH(a)-aH(b))/(a-b)} \cdot \frac{1}{2^{t_2 k_p}} = 1.$$

Now notice that for constants $\alpha, \beta, t_1, t_2 > 0$, the function

$$g(x) = \frac{\alpha}{2^{t_1 x}} + \frac{\beta}{2^{t_2 x}}$$

is injective since $x < y \Rightarrow g(x) > g(y)$. Thus, for

$$\alpha := e^{(H(a)(b-1)+H(b)(1-a))/(a-b)} > 0 \quad\text{and}\quad \beta := e^{(bH(a)-aH(b))/(a-b)} > 0$$

we have $g(k_p) = g(c_p) = 1$ and hence $k_p = c_p$. □
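Theorem 3.12 gives a second route to $C_t$ that never computes $\Phi$: the left-hand side of (*) is strictly decreasing in $c_p$, exceeds 1 at $c_p = 0$ when $a \ne b$, and tends to 0, so bisection applies. The sketch below (hypothetical names, base-$e$ entropy, $a \ne b$ assumed) compares this route with the $\Phi$-based one for a few arbitrarily chosen channels and times; agreement up to floating-point error is what the theorem predicts.

```python
import math


def H(p: float) -> float:
    """Base-e binary entropy with the convention 0 ln 0 = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def capacity_via_phi(a: float, b: float, t1: float, t2: float, steps: int = 200) -> float:
    """C_t(a, b) from the Phi-based characterization (requires a != b)."""
    s = t2 - t1
    K = (b * s - t2) * H(a) + (t2 - a * s) * H(b)
    w = math.exp(-K / (a - b))
    lo, hi = 0.0, 1.0
    for _ in range(steps):  # bisection: w x^t2 - (1 - x)^t1 is increasing in x
        mid = (lo + hi) / 2.0
        if w * mid ** t2 - (1.0 - mid) ** t1 < 0.0:
            lo = mid
        else:
            hi = mid
    phi = (lo + hi) / 2.0
    g = (H(a) * (b - 1.0) + H(b) * (1.0 - a)) / (a - b)
    return (g - math.log(phi)) / (math.log(2.0) * t1)


def capacity_via_star(a: float, b: float, t1: float, t2: float, steps: int = 200) -> float:
    """C_t(a, b) as the root of (*) in Theorem 3.12 (requires a != b)."""
    alpha = math.exp((H(a) * (b - 1.0) + H(b) * (1.0 - a)) / (a - b))
    beta = math.exp((b * H(a) - a * H(b)) / (a - b))

    def lhs(c: float) -> float:
        # Strictly decreasing in c; exceeds 1 at c = 0 whenever a != b.
        return alpha * 2.0 ** (-t1 * c) + beta * 2.0 ** (-t2 * c)

    lo, hi = 0.0, 1.0
    while lhs(hi) > 1.0:  # enlarge the bracket until it contains the root
        hi *= 2.0
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if lhs(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


channels = [(0.8, 0.3, 1.0, 1.0), (0.8, 0.3, 1.0, 2.0), (0.2, 0.9, 1.5, 0.5),
            (1.0, 0.0, 1.0, 3.0), (0.6, 0.0, 2.0, 1.0)]
discrepancies = [abs(capacity_via_phi(*ch) - capacity_via_star(*ch)) for ch in channels]
print(max(discrepancies))
```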

Let us comment briefly on what makes this result so surprising. If we look at the $c_p$ equation and the $\Phi$ equation, it is clear that we can solve each one of them for $\Phi$, and then equate the resulting expressions to obtain a new equation for $(a,b)$. But that equation will depend on $1 - \Phi$. The last theorem shows that this dependence can actually be eliminated! There are two important cases to emphasize:

Example 3.13. The binary symmetric channel. In this case, $a = \bar{b} = 1 - p$ and $b = \bar{a} = p$ with $p$ being the probability of a bit flip. The equation relating capacity $c_p$ to noise is then

$$H(p) = \ln\left(2^{-t_1 c_p} + 2^{-t_2 c_p}\right).$$

Notice that in the untimed case, when $t_1 = t_2 = 1$, we obtain the well-known result that the capacity of a binary symmetric channel is $1 - H_2(p)$, where $H_2$ is the base two entropy.

A simulation of the binary symmetric channel might, for instance, involve inputting an image, flipping a fraction $p$ of its bits, and then displaying the degraded image.
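The equation of Example 3.13 determines $c_p$ implicitly; since its right-hand side decreases in $c_p$, a bisection solves it. The sketch below (hypothetical names, base-$e$ entropy as in the text) checks the untimed reduction to $1 - H_2(p)$ and then solves one timed instance with illustrative times $t = (1, 2)$.

```python
import math


def H(p: float) -> float:
    """Base-e binary entropy with the convention 0 ln 0 = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def bsc_timed_capacity(p: float, t1: float, t2: float, steps: int = 200) -> float:
    """Solve H(p) = ln(2^(-t1 c) + 2^(-t2 c)) for c >= 0 by bisection."""
    target = H(p)

    def residual(c: float) -> float:
        # Decreasing in c; nonnegative at c = 0 because H(p) <= ln 2.
        return math.log(2.0 ** (-t1 * c) + 2.0 ** (-t2 * c)) - target

    lo, hi = 0.0, 1.0
    while residual(hi) > 0.0:
        hi *= 2.0
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


p = 0.1
print(bsc_timed_capacity(p, 1.0, 1.0), 1.0 - H(p) / math.log(2.0))  # untimed: 1 - H_2(p)
print(bsc_timed_capacity(p, 1.0, 2.0))
```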
Example 3.14. The timed Z channel. In this case we have $b = 0$ and $\bar{b} = 1$. Then, substituting into Theorem 3.12 and using $H(0) = 0$,

$$e^{-H(a)/a} \cdot \frac{1}{2^{t_1 c_p}} + \frac{1}{2^{t_2 c_p}} = 1$$

is the equation relating capacity to noise.
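The Z channel equation can be solved the same way. In the untimed case it gives $c_p = \log_2(1 + e^{-H(a)/a})$, and since $e^{-H(a)/a} = a(1-a)^{(1-a)/a}$ this is the familiar closed form for the Z channel; for $a = 1/2$ it equals $\log_2(5/4)$, the value of $C[0,1/2]$ in Example 3.9. The sketch below assumes $0 < a \le 1$, uses the base-$e$ entropy, and its function name is hypothetical.

```python
import math


def H(p: float) -> float:
    """Base-e binary entropy with the convention 0 ln 0 = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def z_timed_capacity(a: float, t1: float, t2: float, steps: int = 200) -> float:
    """Root c of exp(-H(a)/a) 2^(-t1 c) + 2^(-t2 c) = 1 (Z channel: b = 0, 0 < a <= 1)."""
    w = math.exp(-H(a) / a)

    def lhs(c: float) -> float:
        return w * 2.0 ** (-t1 * c) + 2.0 ** (-t2 * c)  # decreasing, equals 1 + w at c = 0

    lo, hi = 0.0, 1.0
    while lhs(hi) > 1.0:
        hi *= 2.0
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if lhs(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(z_timed_capacity(0.5, 1.0, 1.0), math.log2(5 / 4))  # untimed, Example 3.9
print(z_timed_capacity(0.5, 1.0, 2.0))                    # slower '1' symbol
```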

Given a vector $t = (t_1, t_2)$ of times for a $(2,2)$ timing channel, let us write

$$\mathrm{rev}(t) := (t_2, t_1)$$

for the vector whose times are those of $t$ in reverse order.


Proposition 3.15. The following symmetry properties hold for binary timing channels:

(i) $C_t(a,b) = C_t(b,a)$

(ii) $C_t(a,b) = C_{\mathrm{rev}(t)}(\bar{b}, \bar{a})$

Proof. Let

$$f(a,b) = \frac{bH(a) - aH(b)}{a-b} \quad\&\quad g(a,b) = \frac{\bar{a}H(b) - \bar{b}H(a)}{a-b}.$$

Then $f(a,b) = f(b,a)$ and $g(a,b) = g(b,a)$. These properties along with the previous characterization of capacity give (i). In addition, we also have $g(a,b) = f(\bar{b}, \bar{a})$ and $g(\bar{b}, \bar{a}) = f(a,b)$, which will then give (ii). □
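Both symmetries, and the combination $C_t(a,b) = C_{\mathrm{rev}(t)}(\bar{a}, \bar{b})$ used in the next paragraph, can be checked numerically by solving (*) for each side. The sketch below reuses the bisection idea from Theorem 3.12 under the assumption $a \ne b$; the test channels and times are arbitrary and the function name is hypothetical.

```python
import math


def H(p: float) -> float:
    """Base-e binary entropy with the convention 0 ln 0 = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def timed_capacity(a: float, b: float, t1: float, t2: float, steps: int = 200) -> float:
    """C_t(a, b) as the root of (*) from Theorem 3.12 (requires a != b)."""
    alpha = math.exp((H(a) * (b - 1.0) + H(b) * (1.0 - a)) / (a - b))
    beta = math.exp((b * H(a) - a * H(b)) / (a - b))

    def lhs(c: float) -> float:
        return alpha * 2.0 ** (-t1 * c) + beta * 2.0 ** (-t2 * c)

    lo, hi = 0.0, 1.0
    while lhs(hi) > 1.0:
        hi *= 2.0
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if lhs(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


cases = [(0.8, 0.3, 1.0, 2.0), (0.1, 0.65, 0.5, 1.7), (0.95, 0.2, 3.0, 1.0)]
worst = 0.0
for a, b, t1, t2 in cases:
    base = timed_capacity(a, b, t1, t2)
    swapped = timed_capacity(b, a, t1, t2)            # property (i)
    reflected = timed_capacity(1 - b, 1 - a, t2, t1)  # property (ii)
    both = timed_capacity(1 - a, 1 - b, t2, t1)       # (i) and (ii) combined
    worst = max(worst, abs(base - swapped), abs(base - reflected), abs(base - both))
print(f"largest symmetry violation: {worst:.2e}")
```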

By these two results, we then see $C_t(a,b) = C_{\mathrm{rev}(t)}(\bar{a}, \bar{b})$ and thus, without loss of generality, we can always assume that a timing channel satisfies $a \ge b$ and $t_2 \ge t_1$, if all we are interested in is its capacity. Thus, the restriction to nonnegative binary channels, which we made in [6], can be made in the timed case as well. This may be quite a useful thing to know.

Timed capacity does not satisfy the triangle inequality, for the simple reason that the triangle inequality fails in the untimed case ($t_1 = t_2 = 1$).

Theorem 3.16. For fixed times $t_1, t_2 > 0$, the timed capacity $C_t : [0,1]^2 \to [0,\infty)$ satisfies

(i) $C_t(a,b) = C_t(b,a)$,

(ii) $C_t(a,b) = 0$ iff $a = b$,

and the sets $\{y \in [0,1] : C_t(x,y) < \varepsilon\}$ for $x \in [0,1]$ and $\varepsilon > 0$ form a basis for the Euclidean topology on $[0,1]$.


Proof. By the Euclidean continuity of $C_t$, all timed capacity open balls are Euclidean open. Given any Euclidean open ball, it contains a timed capacity open ball around any point it contains since

$$C_t(a,b) \ge \frac{C(a,b)}{\max\{t_1, t_2\}} \ge \frac{(a-b)^2}{e^2\ln(2)\,\max\{t_1, t_2\}},$$

where the second inequality is the lower bound on untimed capacity established earlier. Thus the timed capacity open balls form a basis for a topology, and that topology must be the Euclidean topology on $[0,1]$. The other properties follow from Proposition 3.15 and Theorem 3.12. □
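The first inequality in this proof, $C_t(a,b) \ge C(a,b)/\max\{t_1, t_2\}$, can be spot-checked by solving (*) for $C_t$ and comparing with the closed-form untimed capacity. The grid of channels and the list of time vectors below are arbitrary illustrative choices, the function names are hypothetical, and a check on finitely many points is only supporting evidence.

```python
import math


def H(p: float) -> float:
    """Base-e binary entropy with the convention 0 ln 0 = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def untimed_capacity(a: float, b: float) -> float:
    """Closed-form binary capacity in bits (requires a != b)."""
    h2a, h2b = H(a) / math.log(2.0), H(b) / math.log(2.0)
    e1 = ((1.0 - a) * h2b - (1.0 - b) * h2a) / (a - b)
    e2 = (b * h2a - a * h2b) / (a - b)
    return math.log2(2.0 ** e1 + 2.0 ** e2)


def timed_capacity(a: float, b: float, t1: float, t2: float, steps: int = 200) -> float:
    """C_t(a, b) as the root of (*) from Theorem 3.12 (requires a != b)."""
    alpha = math.exp((H(a) * (b - 1.0) + H(b) * (1.0 - a)) / (a - b))
    beta = math.exp((b * H(a) - a * H(b)) / (a - b))

    def lhs(c: float) -> float:
        return alpha * 2.0 ** (-t1 * c) + beta * 2.0 ** (-t2 * c)

    lo, hi = 0.0, 1.0
    while lhs(hi) > 1.0:
        hi *= 2.0
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if lhs(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


pairs = [(i / 10, j / 10) for i in range(11) for j in range(11) if i != j]
times = [(1.0, 1.0), (1.0, 2.0), (3.0, 0.5), (0.25, 4.0)]
ok = all(
    timed_capacity(a, b, t1, t2) >= untimed_capacity(a, b) / max(t1, t2) - 1e-9
    for (a, b) in pairs
    for (t1, t2) in times
)
print(ok)
```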


4.  Capacity  in  place  of  distance? 

We  consider  a  few  cases  where  one  can  replace  Euclidean  distance  by  capacity. 


4.1. A reformulation of measurement

By simple rescaling, any map $\mu : D \to [0,\infty)$ can be assumed to map into $[0,1]$. To determine whether a map $\mu : D \to [0,1]^*$ is a measurement requires consideration of sets of the form

$$\mu_\varepsilon(x) := \{y \in D : y \sqsubseteq x \;\&\; |\mu x - \mu y| < \varepsilon\}.$$

However, because capacity also yields the Euclidean topology, we can instead use the sets

$$\mu_\varepsilon(x) := \{y \in D : y \sqsubseteq x \;\&\; C(\mu x, \mu y) < \varepsilon\}.$$

We could have put any ‘semimetric’ in place of Euclidean distance, but we chose capacity because it has an intriguing interpretation: objects $x$ and $y$ serve to define probabilities $\mu x$ and $\mu y$ which define the noise matrix for a binary channel; the capacity of the resulting channel is a measure of distance between $x$ and $y$. That is, the amount of information that can be communicated from one point in space to another provides a measure of distance that is capable of yielding the space's topology. This, for instance, is what happens with the unit interval.


K.  Martin  /  Theoretical  Computer  Science  405  (2008)  75-87 




4.2.  Topology  of  manifolds 

The restriction of capacity from the unit square to $(0,1)^2$ is a distance function which yields the Euclidean topology on $(0,1)$. For $x, y \in (0,1)^n$, the function

$$C(x,y) = \sum_{i=1}^{n} C(x_i, y_i)$$

yields the Euclidean topology on $(0,1)^n \simeq \mathbb{R}^n$. Thus, a pair of $n$-tuples is understood as defining an $n$-tuple of binary channels $(x_i, y_i)$, and so the topology of a manifold can be understood as stemming from this function (locally). In particular, this is true of spacetime.
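The product construction is straightforward to compute: each coordinate pair $(x_i, y_i)$ is read as a binary channel and the capacities are summed. The sketch below assumes the closed-form binary capacity; the sample points and the function name `channel_distance` are hypothetical, and the checks only confirm symmetry and vanishing on the diagonal for these points.

```python
import math


def h2(p: float) -> float:
    """Base-two binary entropy, with the convention 0 * log(0) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def capacity(a: float, b: float) -> float:
    """Capacity (bits) of the binary channel with P(0|0) = a and P(0|1) = b."""
    if abs(a - b) < 1e-12:
        return 0.0
    e1 = ((1.0 - a) * h2(b) - (1.0 - b) * h2(a)) / (a - b)
    e2 = (b * h2(a) - a * h2(b)) / (a - b)
    return math.log2(2.0 ** e1 + 2.0 ** e2)


def channel_distance(x: tuple[float, ...], y: tuple[float, ...]) -> float:
    """C(x, y) = sum_i C(x_i, y_i) for points of (0, 1)^n."""
    return sum(capacity(xi, yi) for xi, yi in zip(x, y))


p = (0.2, 0.5, 0.9)
q = (0.25, 0.5, 0.6)
print(channel_distance(p, q), channel_distance(q, p), channel_distance(p, p))
```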

5.  Questions 

When  is  binary  timed  capacity  monotone? 

For  further  reading 

[1],  [10],  [11],  [12]. 

Appendix.  Topology 

Nets are a generalization of sequences. Let $X$ be a space.

Definition A.1. A net is a function $f : I \to X$ where $I$ is a directed poset.

A subset $J$ of $I$ is cofinal if for all $\alpha \in I$, there is $\beta \in J$ with $\alpha \le \beta$.

Definition A.2. Let $f : I \to X$ be a net with a function $g : J \to I$ such that $J$ is a directed poset and

• For all $x, y \in J$, $x \le y \Rightarrow g(x) \le g(y)$

• $g(J)$ is cofinal in $I$.

The function $f \circ g : J \to X$ is called a subnet of $f$.

Definition A.3. A net $f : I \to X$ converges to $x \in X$ if for all open $U \subseteq X$ with $x \in U$, there is $\alpha \in I$ such that
$$\alpha \le \beta \Rightarrow f(\beta) \in U$$
for all $\beta \in I$. This is written $f \to x$.

The  following  are  all  standard  results  of  basic  topology: 

Theorem A.4. (i) If $(x_i)$ is a net that converges to $x$, then so does each subnet of $(x_i)$.

(ii) If we have nets $x_i \to x \in X$ and $y_i \to y \in Y$, then the net $z_i = (x_i, y_i) \to (x, y) \in X \times Y$.

(iii) A function $f : X \to Y$ is continuous iff for each $x \in X$ and each net $(x_i)$ that converges to $x$, the net $(f(x_i))$ converges to $f(x)$.

(iv) A space $X$ is compact iff every net $f : I \to X$ has a convergent subnet.